provider/akamai: return early when there are no relevant changes #6189
endocrimes wants to merge 2 commits into kubernetes-sigs:master
Conversation
Currently, if a cluster has multiple instances of external-dns with different domain filters (e.g. across multiple providers), the Akamai provider will repeatedly re-create all of its zones' records, even when there are no changes relevant to its filtered set.

While this isn't inherently a problem for a single cluster controlling an entire zone, if you have many clusters updating records in a single zone, the chances of these no-op operations conflicting, failing, and causing external-dns to exit are quite high. This can prevent real changes from being applied.

This commit adds defensive filtering and an early exit when no relevant changes are detected, based heavily on the implementations of other, more active providers.

Signed-off-by: Danielle Lancashire <dani@builds.terrible.systems>
Could you have a look at how it is done on AWS, for example? We have a common interface, zonesToTagFilter (external-dns/provider/aws/aws.go, line 393 at 633b95e). Another thing, if you have any interest: would you be willing to implement the zone cache first, in a separate PR (https://github.com/kubernetes-sigs/external-dns/blob/master/provider/blueprint/zone_cache.go)? Implementing the cache will reduce drift in this provider. For most changes we expect evidence similar to #5085 (comment), as we have no other way to validate that the feature is working.
@ivankatliarchuk It was actually pretty hard to find a pattern for this, because a bunch of providers end up implementing it really subtly (by going from their configured zones -> records to update, rather than working from the changeset -> things to update). Just filtering the result of

```
time="2026-02-11T10:34:05Z" level=debug msg="Fetched zone: 'prod.example.com' (ZoneID: REDACTED)"
time="2026-02-11T10:34:05Z" level=debug msg="Fetched zone: 'cluster.example.com' (ZoneID: REDACTED)"
time="2026-02-11T10:34:05Z" level=debug msg="Fetched '2' zones from Akamai"
time="2026-02-11T10:34:05Z" level=debug msg="Processing zones: [map[prod.example.com:prod.example.com cluster.example.com:cluster.example.com]]"
time="2026-02-11T10:34:05Z" level=debug msg="Create Changes requested [[*.ap-west.infra.notexample.com 0 IN A REDACTED [] a-*.ap-west.infra.notexample.com 0 IN TXT \"heritage=external-dns,external-dns/owner=...\" []]]"
time="2026-02-11T10:34:05Z" level=debug msg="Skipping Akamai Edge DNS creation of endpoint: '*.ap-west.infra.notexample.com' type: 'A', it does not match against Domain filters"
time="2026-02-11T10:34:05Z" level=debug msg="Skipping Akamai Edge DNS creation of endpoint: 'a-*.ap-west.infra.notexample.com' type: 'TXT', it does not match against Domain filters"
time="2026-02-11T10:34:05Z" level=error msg="Failed to create endpoints for DNS zone cluster.example.com. Error: Modification Confict: [Concurrent Zone Modification Error for cluster.example.com. Please try your request again.]"
time="2026-02-11T10:34:05Z" level=fatal msg="Failed to do run once: Modification Confict: [Concurrent Zone Modification Error for cluster.example.com. Please try your request again.]"
```

The AWS provider's equivalent of the filter suggested here is part of the mapping that occurs at external-dns/provider/aws/aws.go, line 1293 at 633b95e.
ivankatliarchuk
left a comment
The proposed PR change (early return on empty changes) only avoids fetchZones() when the changeset is literally empty. But every regular reconciliation cycle calls both Records() (line 213) and ApplyChanges(); that's two fetchZones() API calls per loop iteration, and fetchZones hits p.client.ListZones() every time, with no caching.
Proposed in a separate PR: utilize blueprint.ZoneCache for this provider.
The zone cache would eliminate redundant ListZones calls on every cycle (both Records and ApplyChanges would share the cache), while the early return only helps the edge case of empty changesets. They're complementary, but if I had to pick one, the cache gives far more value.
That said, both are small, independent fixes: the cache is the bigger win, and the edgeChangesByZone cleanup is still a correct bugfix regardless.
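For illustration only, a cache along these lines would let Records and ApplyChanges share one ListZones result per TTL window (a hand-rolled sketch, not the actual blueprint.ZoneCache API; zoneList is a placeholder type, see provider/blueprint/zone_cache.go for the real interface):

```go
package akamai

import (
	"sync"
	"time"
)

// zoneList stands in for whatever the Akamai client returns from ListZones;
// it is a placeholder, not the provider's real type.
type zoneList struct{ Zones []string }

// zoneCache memoizes the zone list for a TTL so that Records() and
// ApplyChanges() in the same reconcile loop share a single ListZones call.
type zoneCache struct {
	mu      sync.Mutex
	cached  *zoneList
	fetched time.Time
	ttl     time.Duration
}

func (c *zoneCache) get(refresh func() (*zoneList, error)) (*zoneList, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.cached != nil && time.Since(c.fetched) < c.ttl {
		return c.cached, nil // cache hit: no ListZones API call this cycle
	}
	zones, err := refresh() // miss or expired: hit the API once, then memoize
	if err != nil {
		return nil, err
	}
	c.cached, c.fetched = zones, time.Now()
	return zones, nil
}
```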
```
@@ -484,3 +501,15 @@ func edgeChangesByZone(zoneMap provider.ZoneIDName, endpoints []*endpoint.Endpoi
```
```go
// remove empty zones to avoid no-op API calls
for zone, eps := range createsByZone {
	if len(eps) == 0 {
		delete(createsByZone, zone)
	}
}
```
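(One note on the suggestion above: deleting entries from a Go map while ranging over that same map is well defined by the language spec, so the cleanup can live directly inside edgeChangesByZone.)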
That eliminates the empty API calls at the source. Then filterEndpointsByZone and the second early return become unnecessary and can be removed. Still needs testing; it could be that the unit tests aren't right or something.
```
@@ -256,6 +256,12 @@ func (p AkamaiProvider) Records(context.Context) ([]*endpoint.Endpoint, error) {

// ApplyChanges applies a given set of changes in a given zone.
func (p AkamaiProvider) ApplyChanges(_ context.Context, changes *plan.Changes) error {
```
Suggested change:

```go
func (p AkamaiProvider) ApplyChanges(_ context.Context, changes *plan.Changes) error {
	// return early if there is nothing to change
	if len(changes.Create) == 0 && len(changes.Delete) == 0 && len(changes.UpdateNew) == 0 {
		log.Info("All records are already up to date")
		return nil
	}
	zoneNameIDMapper := provider.ZoneIDName{}
	zones, err := p.fetchZones()
	if err != nil {
		log.Errorf("Failed to fetch zones from Akamai")
		return err
	}
	for _, z := range zones.Zones {
		zoneNameIDMapper[z.Zone] = z.Zone
	}
	log.Debugf("Processing zones: [%v]", zoneNameIDMapper)
	// Create recordsets
	log.Debugf("Create Changes requested [%v]", changes.Create)
	if err := p.createRecordsets(zoneNameIDMapper, changes.Create); err != nil {
		return err
	}
	// Delete recordsets
	log.Debugf("Delete Changes requested [%v]", changes.Delete)
	if err := p.deleteRecordsets(zoneNameIDMapper, changes.Delete); err != nil {
		return err
	}
	// Update recordsets
	log.Debugf("Update Changes requested [%v]", changes.UpdateNew)
	if err := p.updateNewRecordsets(zoneNameIDMapper, changes.UpdateNew); err != nil {
		return err
	}
	// Check that all old endpoints were accounted for
	revRecs := changes.Delete
	revRecs = append(revRecs, changes.UpdateNew...)
	for _, rec := range changes.UpdateOld {
		found := false
		for _, r := range revRecs {
			if rec.DNSName == r.DNSName {
				found = true
				break
			}
		}
		if !found {
			log.Warnf("UpdateOld endpoint '%s' is not accounted for in UpdateNew|Delete endpoint list", rec.DNSName)
		}
	}
	return nil
}
```
```go
// ApplyChanges applies a given set of changes in a given zone.
func (p AkamaiProvider) ApplyChanges(_ context.Context, changes *plan.Changes) error {
	// return early if there is nothing to change
	if len(changes.Create) == 0 && len(changes.Delete) == 0 && len(changes.UpdateNew) == 0 {
```
Suggested change:

```go
	if !changes.HasChanges() {
```
And most likely log.Debug, not log.Info.
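Putting the two suggestions together, the guard at the top of ApplyChanges would read:

```go
// return early if there is nothing to change
if !changes.HasChanges() {
	log.Debug("All records are already up to date")
	return nil
}
```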
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver in each of these files. The full list of commands accepted by this bot can be found here.
What does it do?
Currently, if a cluster has multiple instances of external-dns with
different domain filters (e.g. across multiple providers), the Akamai
provider will repeatedly re-create all of its zones' records, even when
there are no changes relevant to its filtered set.
While this isn't inherently a problem for a single cluster controlling
an entire zone, if you have many clusters updating records in a single
zone, the chances of these no-op operations conflicting, failing, and
causing external-dns to exit are quite high. This can prevent
real changes from being applied.
This PR adds defensive filtering and an early exit when no relevant
changes are detected, based heavily on the implementations of
other, more active providers.
Motivation
Production issues with external-dns regularly crashing when it conflicts with other instances over no-op change applications.